Abstract

The popularity of online social media platforms provides an unprecedented opportunity
to study real-world complex networks of interactions. However, releasing this data to
researchers and the public comes at the cost of potentially exposing private and
sensitive user information. It has been shown that a naive anonymization of a network
by removing the identity of the nodes is not sufficient to preserve users' privacy. In
order to deal with malicious attacks, k-anonymity solutions have been proposed to
partially obfuscate topological information that can be used to infer nodes' identity.

In this paper, we study the problem of ensuring k-anonymity in time-varying graphs,
i.e., graphs with a structure that changes over time, and multi-layer graphs, i.e.,
graphs with multiple types of links. More specifically, we examine the case in which
the attacker has access to the degree of the nodes. The goal is to generate a new graph
where, given the degree of a node in each (temporal) layer of the graph, such a node
remains indistinguishable from $k-1$ other nodes in the graph. In order to achieve
this, we find the optimal partitioning of the graph nodes such that the cost of
anonymizing the degree information within each group is minimum. We show that this
reduces to a special case of a Generalized Assignment Problem, and we propose a simple
yet effective algorithm to solve it. Finally, we introduce an iterated linear
programming approach to enforce the realizability of the anonymized degree sequences.
The efficacy of the method is assessed through an extensive set of experiments on
synthetic and real-world graphs.


Introduction 

Interactions between users in an Online Social Network (OSN) can be abstracted using a
graph representation. More complex dynamics, such as interactions over time or across
multiple media, are successfully captured by means of time-varying (Holme and Saramäki
2012) and multi-layer networks (Hristova, Musolesi, and Mascolo 2014), respectively.
Applications of these datasets include the analysis of structural properties of social
networks (Mislove et al. 2007), the investigation of the dynamics of information
spreading in social media (Kwak et al. 2010), and the generation of personalized
recommendations for online systems (Andersen et al. 2008). However, there is an
increasing concern for the privacy implications related to the management, mining and
distribution of these datasets.

It has been shown that simply generating a random identifier to label the nodes of the
graphs does not guarantee privacy (Backstrom, Dwork, and Kleinberg 2007). In fact, an
attacker may be able to identify the nodes of a graph simply by collecting information
from external sources about their interactions. For example, if the attacker knows that
the target individual interacts with a certain number of other users in the network,
and that number turns out to be unique for that individual, this piece of information
alone is enough to identify the user among all the nodes of the network. While the
problem of privacy preservation for digital data has been extensively studied in the
literature (Meyerson and Williams 2004; Fung et al. 2010), the emergence of large
graphs as a tool to model and analyze online social interactions has recently shifted
research efforts to the problem of anonymizing structural data (Backstrom, Dwork, and
Kleinberg 2007; Hay et al. 2007; Hay et al. 2008; Liu and Terzi 2008). In particular,
Liu and Terzi (Liu and Terzi 2008) provided the first algorithm to guarantee the
construction of a k-degree anonymous graph. The k-anonymity model aims at ensuring
that, given a structural query, at least k nodes in the graph satisfy the query. In
particular, k-degree anonymity guarantees that each node of the graph shares the same
degree as at least $k-1$ other nodes.

Although many networks are naturally modeled as dynamic systems, in most studies the
temporal dimension is usually abstracted to produce an aggregated static graph. Despite
giving an overall picture of the structure which still allows for some interesting
analyses, much of the information is lost in the aggregation, and thus researchers have
started to turn their attention to the analysis of the dynamic version of the graphs.
However, this calls for novel anonymization techniques that are able to cope with the
additional longitudinal dimension.


Inspired by the seminal work of Liu and Terzi (Liu and Terzi 2008) on anonymization of
(single-layer) graphs, we consider the problem of k-degree anonymity in time-varying
and multi-layer graphs. While most of the algorithms in the literature attempt to solve
the k-anonymity problem in
Figure 1: The proposed anonymization framework: given a time-varying graph, (1) the
temporal degree sequence of each node is anonymized, (2) the resulting degree sequences
are checked to ensure that each temporal slice of the anonymized graph is realizable,
and finally (3) the anonymized time-varying graph is constructed. Here the colors
indicate the anonymity groups.


single-layer graphs, we are interested in protecting the privacy of users participating
in a network which has a structure that evolves with time (Holme and Saramäki 2012),
i.e., a time-varying graph, or with multiple types of links associated with the same
pair of users, i.e., a multi-layer graph (Mucha et al. 2010). The structure of a
multi-layer graph can be interpreted as that of a time-varying graph where each
temporal slice corresponds to a separate layer and the order of the slices does not
matter; in the remainder of the paper we will refer exclusively to time-varying graphs
for simplicity.

Note that a naive approach that enforces k-degree anonymity independently in each
temporal slice is not sufficient to ensure k-degree anonymity for the whole
time-varying graph, as it is possible to decrease the level of anonymity k by observing
the degree sequences of the nodes through time, i.e., their temporal degree sequences.
Thus, we need to ensure that the temporal degree sequence of each node is
indistinguishable from that of at least $k-1$ other nodes while preserving as much
structure of the original time-varying graph as possible. Fig. 1 shows the pipeline of
the proposed approach. Given a time-varying graph and a desired anonymity level k in
input, a first module outputs a new set of anonymous degree sequences by solving an
$\ell_1$-norm minimization problem using a simple yet effective solution based on a
variation of the k-means algorithm (Jain and Dubes 1988). A second module ensures these
sequences are realizable (Erdős 1960), i.e., that there exists a temporal slice with
the given degree sequence, while a third and final module generates a k-anonymous
time-varying graph from the anonymized and realizable degree sequences.

We conduct an extensive set of experiments on a number of real-world networks, and we
show that it is possible to anonymize large graphs while minimizing the loss of
structural information. Moreover, we show that when the temporal slices are
structurally correlated, i.e., successive slices show a similar structure, the
complexity of the anonymization task decreases. To the best of our knowledge, this is
the first work to investigate the problem of k-degree anonymity in time-varying and
multi-layer graphs.

Related Work 


The concept of k-anonymity in the graph domain was introduced by Hay et al. (Hay et al.
2007; Hay et al. 2008), but it is only with Liu and Terzi (Liu and Terzi 2008) that a
first algorithm to construct a k-anonymous graph was proposed. As their algorithm is
designed to work on static graphs, however, if applied to the temporal slices of a
time-varying graph it fails to take into account the additional information contained
in the temporal dimension, i.e., the size of the anonymity groups in the temporal graph
will be lower than that of the individual slices. Moreover, their technique generally
requires repeated anonymizations of the graph under increasing levels of structural
noise, something that is not computationally feasible when dealing with large
time-varying graphs. A number of successive works proposed heuristics to reduce the
total running time, thus making it feasible to anonymize large static social networks
(Lu, Song, and Bressan 2012; Casas-Roma, Herrera-Joancomartí, and Torra 2013; Hartung,
Hoffmann, and Nichterlein 2014).


Chester et al. (Chester et al. 201^) considered a scenario in which the level of
privacy concern of the different nodes of a network varies, i.e., only a subset of
nodes of the network is anonymized. Other researchers, on the other hand, focused on
stricter definitions of k-anonymity, where the amount of structural information
available to the attacker ranges from the immediate neighborhood of a node to the whole
graph structure (Hay et al. 2008; Zhou and Pei 2008; Zou, Chen, and Özsu 2009; Cheng,
Lu, and Liu 2010; Zhou and Pei 2011). However, it is worth noting that the more
structural information we take into account during the anonymization process, the more
noise we need to add to the original graph, and the less informative the resulting
anonymized graph will be.

Some researchers have also started investigating the anonymization of time-varying
graphs (Zou, Chen, and Özsu 2009; Bhagat et al. 2010; Medforth and Wang 2011). Zou et
al. (Zou, Chen, and Özsu 2009) considered the problem of constructing k-automorphic
graphs, i.e., graphs where each vertex v cannot be distinguished from $k-1$ symmetric
vertices given any structural information. The authors also proposed a way to account
for graphs that are periodically republished by replacing the node IDs with generalized
vertex IDs. These are designed in a way that makes it impossible for the attacker to
link the structural information of nodes across different temporal slices. However,
this comes at the cost of being unable to trace a node along the temporal dimension,
thus hindering the analysis of the anonymized network. Bhagat et al. (Bhagat et al.
2010) extended the list-based anonymization scheme of (Bhagat et al. 2009) to dynamic
graphs by grouping the labels of the graph nodes so as to maximize node and edge
anonymity while minimizing structural information loss. Medforth and Wang (Medforth and
Wang 2011) also studied dynamic graphs, but instead of considering a passive attack as
in Zou et al. (Zou, Chen, and Özsu 2009), they accounted for an attacker that can
actively influence the degree of the target node by interacting with the network. In
contrast with these approaches, our aim
Figure 2: Although each slice $G_t$ satisfies 2-degree anonymity, the temporal degree
vectors of the nodes are [2,2],[2,2] and [1,1], and thus the time-varying graph does
not satisfy 2-degree anonymity.


is to provide a method for anonymizing the node degrees of sequences of graphs so that
the label of a node remains fixed over time and the evolution of the interactions of
specific nodes can be traced and analyzed.

Problem Definition 

Let $\mathcal{G} = \{G_1, \dots, G_T\}$ denote a time-varying graph over a fixed set of
vertices $V$, with $|V| = n$. That is, $\mathcal{G}$ is a sequence of undirected and
unattributed graphs $G_t(V, E_t)$, with $t = 1, \dots, T$, where $E_t$ is the set of
edges active at time $t$. We define the $n \times T$ degree matrix $D = \{d_{it}\}$,
where $d_{it}$ is the degree of the $i$-th node of $G_t$, and we call the vector
$d_{i\cdot} = [d_{i1}, \dots, d_{iT}]$ the temporal degree vector, or temporal degree
sequence, of the $i$-th node. Let us also denote with
$d_{\cdot t} = [d_{1t}, \dots, d_{nt}]$ the degree sequence of the $t$-th slice.
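As a small illustration of this notation, the following sketch builds the degree matrix from a list of edge lists, one per temporal slice. The function name `degree_matrix`, the edge-list input format and the toy data are assumptions made for the example; they are not part of the paper.

```python
def degree_matrix(n, slices):
    """Return the n x T degree matrix D, with D[i][t] the degree of node i in slice t.

    `slices` is a list of T edge lists; each edge is a pair (u, v) of node
    indices in range(n). Edges are assumed undirected, without self-loops
    or duplicates (i.e., every slice is a simple graph).
    """
    T = len(slices)
    D = [[0] * T for _ in range(n)]
    for t, edges in enumerate(slices):
        for u, v in edges:
            D[u][t] += 1
            D[v][t] += 1
    return D


# Two slices over 4 nodes (toy data, not taken from the paper).
slices = [[(0, 1), (1, 2)], [(0, 1), (2, 3)]]
D = degree_matrix(4, slices)

temporal_vector_of_node_1 = D[1]                    # row of D: temporal degree vector
degree_sequence_of_slice_0 = [row[0] for row in D]  # column of D: degree sequence
```

Rows of `D` correspond to temporal degree vectors and columns to per-slice degree sequences, which is the distinction the rest of the section relies on.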

Given an arbitrary degree sequence $d_{\cdot t}$, a typical problem is to find a simple
graph $G_t$ such that its degree sequence is $d_{\cdot t}$, where an undirected graph
is called simple if it has no self loops and no more than one edge between any two
vertices. If such a graph exists, the degree sequence is called realizable. More
formally, Erdős and Gallai provide the following necessary and sufficient condition
(Erdős 1960).

Definition 1. A degree sequence $d_{\cdot t}$, such that $d_{1t} \geq \dots \geq d_{nt}$,
is realizable if and only if $\sum_{i=1}^{n} d_{it}$ is even and for each
$1 \leq j \leq n$ it holds that

$$\sum_{i=1}^{j} d_{it} \leq j(j-1) + \sum_{i=j+1}^{n} \min(d_{it}, j) \tag{1}$$

Note that in this paper we will work only with undirected and unattributed graphs,
i.e., the adjacency matrix of each $G_t$ is symmetric and binary. Our approach can be
extended to deal with directed edges by solving two separate anonymization problems,
one for the in-degree and one for the out-degree, and ensuring that the resulting
degree sequences are realizable (Erdős, Miklós, and Toroczkai 2010).


Temporal Graph Anonymity

Our goal is to create an anonymized version of a time-varying graph $\mathcal{G}$ such
that each node is indistinguishable from $k-1$ other nodes based on its temporal degree
vector. Recall that a vector of natural numbers $v$ is k-anonymous if, for every entry
$v_i$, there exist at least $k-1$ entries $v_j$ with the same value. Based on this
definition, Liu and Terzi (Liu and Terzi 2008) introduce the concept of a k-degree
anonymous graph, i.e., a graph whose degree sequence is k-anonymous. For example, the
vector $v = [1, 1, 1, 2, 2]$ is 2-degree anonymous. However, note that the sum of its
entries is odd, and thus it does not correspond to any k-degree anonymous graph. On the
other hand, a complete graph with four nodes is 4-degree anonymous, and its degree
sequence is $d_{\cdot t} = [3, 3, 3, 3]$.

Let the $n \times T$ matrix $V$ denote a set of $n$ vectors of length $T$. We say that
$V$ is a set of k-anonymous vectors if for each row $v_{i\cdot}$, there are at least
$k-1$ vectors $v_{j\cdot}$ such that $v_{it} = v_{jt}$, for each $t = 1, \dots, T$. We
define a k-degree anonymous time-varying graph as follows:

Definition 2. A time-varying graph $\mathcal{G}$ is k-degree anonymous if its degree
matrix $D$ defines a set of k-anonymous vectors.

Note that simply requiring that each temporal slice $G_t$ is k-degree anonymous is not
sufficient to ensure k-degree anonymity in the time-varying graph, as Fig. 2 shows.

It should be clear that, independently from the value of k and the structure of
$\mathcal{G}$, there always exists a solution to this problem. In fact, the
time-varying graph where each $G_t$ is either completely connected or completely
disconnected is k-degree anonymous, for any $1 \leq k \leq n$. However, such a solution
is far from being optimal, in the sense that, in order to anonymize the graph, we need
to introduce a large amount of structural noise that inevitably obfuscates the
characteristics of the original graph that we aim to preserve. Recall that the edit
distance between two graphs is defined as the cost of the least-cost sequence of edit
operations that transforms one graph into the other (Bunke 1997). Hence, the optimal
solution is to look for the k-anonymous graph $\hat{\mathcal{G}}$ such that the edit
distance between $\mathcal{G}$ and $\hat{\mathcal{G}}$ is minimized.

We propose to approximate this problem as follows. We first look for the k-degree
anonymous degree matrix $\hat{D}$ such that the distance

$$\mathrm{dist}(D, \hat{D}) = \frac{1}{2} \sum_{i=1}^{n} \| d_{i\cdot} - \hat{d}_{i\cdot} \|_1 \tag{2}$$

is minimized, where $\hat{d}_{i\cdot}$ is the temporal degree sequence of the $i$-th
node of $\hat{\mathcal{G}}$, and $\|x\|_1$ denotes the $\ell_1$ norm of the vector $x$.
Then, we construct the anonymized graph $\hat{\mathcal{G}}$ with degree matrix
$\hat{D}$ such that the structure of the original graph and its anonymized counterpart
are as similar as possible. Note that Eq. 2 defines a lower bound on the edit distance,
i.e., there is no $\hat{\mathcal{G}}$ with degree matrix $\hat{D}$ such that its edit
distance from $\mathcal{G}$ is smaller than $\mathrm{dist}(D, \hat{D})$. Thus, we try
to minimize the edit distance by first looking for a k-anonymous degree matrix
$\hat{D}$ that is as close as possible to the original one in the $\ell_1$ sense, i.e.,
a minimizer of the lower bound, and then building a graph with degree matrix $\hat{D}$
such that the edge overlap with the original graph is the largest.
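The two notions used above, k-anonymity of a set of temporal degree vectors and the degree-based distance, can be made concrete with a short sketch. It assumes that the distance of Eq. 2 is half the sum of the $\ell_1$ distances between corresponding rows (each edge edit changes the degree of two nodes by one, which is what makes it a lower bound on the edit distance); the function names and the toy matrices are illustrative only.

```python
from collections import Counter


def is_k_anonymous(V, k):
    """True if every row of V is shared by at least k rows (itself included)."""
    counts = Counter(tuple(row) for row in V)
    return all(c >= k for c in counts.values())


def dist(D, D_hat):
    """Half the sum of the l1 distances between corresponding rows (assumed Eq. 2)."""
    total = sum(abs(a - b)
                for row, row_hat in zip(D, D_hat)
                for a, b in zip(row, row_hat))
    return total / 2


# Toy degree matrix (4 nodes, 2 slices; not from the paper): each slice is
# 2-degree anonymous on its own, yet all temporal degree vectors are distinct.
D = [[1, 2], [1, 1], [2, 1], [2, 2]]
slice_ok = all(is_k_anonymous([[row[t]] for row in D], 2) for t in range(2))
temporal_ok = is_k_anonymous(D, 2)

# One possible 2-anonymous matrix and its distance from D.
D_hat = [[1, 2], [1, 2], [2, 1], [2, 1]]
cost = dist(D, D_hat)
```

The example mirrors the observation made for Fig. 2: slice-wise anonymity does not imply anonymity of the temporal degree vectors.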

In the next section we show that an optimal solution $\hat{D}$ can be found by solving
a particular type of generalized assignment problem. However, as we will see, the
solution of this problem is not guaranteed to define a set of realizable degree
sequences, and thus we will need an additional mechanism to make sure that the
anonymized degree sequences of the temporal slices are all realizable. It is worth
noting that we allow simultaneous edge additions and deletions. This has been shown to
yield a better approximation of the original graph than the case where only edge
additions are allowed (Liu and Terzi 2008).
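The realizability condition of Definition 1 can be checked directly. The sketch below is a plain implementation of the Erdős–Gallai test (parity of the sum plus the inequality of Eq. 1 for every prefix of the sorted sequence); the function name is an assumption for the example.

```python
def is_realizable(degrees):
    """Erdős–Gallai test: can `degrees` be the degree sequence of a simple graph?"""
    d = sorted(degrees, reverse=True)
    n = len(d)
    # Degrees must be non-negative and their sum must be even.
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for j in range(1, n + 1):
        lhs = sum(d[:j])                                     # sum of the j largest degrees
        rhs = j * (j - 1) + sum(min(x, j) for x in d[j:])    # right-hand side of Eq. 1
        if lhs > rhs:
            return False
    return True


complete_graph = is_realizable([3, 3, 3, 3])       # degree sequence of K4
odd_sum = is_realizable([1, 1, 1, 2, 2])           # 2-anonymous, but odd sum
even_but_infeasible = is_realizable([3, 3, 1, 1])  # 2-anonymous, even sum
```

The last sequence is 2-anonymous and has an even sum, yet it violates Eq. 1 for $j = 2$, which illustrates why anonymity alone does not guarantee realizability.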

Anonymization Framework 

Recall that our goal is to partition the graph nodes into groups of size at least k,
such that each node of a group shares the same temporal degree vector. In addition to
this, we want to ensure that a minimal number of structural changes is needed to create
these groups. More formally, we are looking for a grouping of the nodes such that the
sum of the $\ell_1$ distances between the temporal degree sequences of the original and
the anonymized graph is minimized. This is in general a non-convex and NP-hard problem
(Meyerson and Williams 2004).

Enforcing Anonymity 

We propose to solve an approximation of the above problem based on a variation of the
k-means algorithm (Jain and Dubes 1988) in an $\ell_1$ metric space. The k-means
algorithm is a two-step method to cluster points in an $\ell_2$ metric space. The
objective of k-means is to minimize the squared deviations from the group centroids,
which is equal to the average pairwise squared $\ell_2$ distances between the points
and the centroid of the cluster. Given an initial set of k centroids, the algorithm
proceeds by alternating an assignment step, where the points are assigned to the
closest centroid, and an update step, where the new centroids of the clusters are
computed. In particular, if the points lie in $\ell_2$ space the centroid of a cluster
is defined as the mean of the points belonging to it. It is important to note that
k-means can transform a potentially non-convex problem into two convex sub-problems,
namely the assignment and the update steps, for which a globally optimal solution can
be found.

In contrast with k-means, here we need to minimize the average absolute deviation of
the original temporal degree sequences from the temporal degree sequences of the
anonymized graph. In other words, in our case the centroid of a group is defined as the
generalized set median, i.e., the point that minimizes the $\ell_1$ distance from the
points of the group. Let $m = \lfloor n/k \rfloor$ denote the number of anonymity
groups. We start by defining a random partition of the $n$ nodes into $m$ groups, and
we compute the $m \times T$ matrix $P = \{p_{ij}\}$ whose rows are the group medians,
i.e., $p_{i\cdot}$ is the set median defined by the $d_{j\cdot}$ assigned to the $i$-th
group.
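The update step can be sketched as follows, under the assumption that the median is taken coordinate-wise. Since the sum of $\ell_1$ distances separates across coordinates and each one-dimensional term is minimized by a median, the coordinate-wise median is a minimizer of the total $\ell_1$ distance from the group. The use of `median_low`, which always returns one of the observed (integer) degrees, and the toy group are choices made for the example.

```python
from statistics import median_low


def l1(u, v):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def group_median(vectors):
    """Coordinate-wise (low) median of a group of temporal degree vectors."""
    return [median_low(column) for column in zip(*vectors)]


# Three temporal degree vectors over T = 2 slices (toy data).
group = [[1, 4], [2, 2], [5, 3]]
p = group_median(group)
cost = sum(l1(d, p) for d in group)

# No other integer point in a small neighbourhood does better.
best_nearby = min(sum(l1(d, [x, y]) for d in group)
                  for x in range(0, 7) for y in range(0, 7))
```

Note that the resulting vector need not coincide with the temporal degree vector of any node of the group.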

The assignment step, on the other hand, requires solving a Generalized Assignment
Problem (GAP) with lower bounds (Hamada, Iwama, and Miyazaki 2011; Krumke and Thielen
2013). In fact, with respect to the standard version of k-means, we have the additional
constraint that each group has to hold at least k members in order to guarantee
k-anonymity. In the classical version of GAP, the goal is to find an optimal assignment
of $n$ items to $m$ bins, such that each bin cannot contain more than a fixed amount of
items, and the assignment of an item to a bin is associated with a certain cost. In our
case the size of an item is 1, and the problem is also known as Seminar Assignment
Problem (SAP) (Hamada, Iwama, and Miyazaki 2011; Krumke and Thielen 2013). Both GAP and
SAP are known to be NP-hard (Krumke and Thielen 2013). However, when the number of bins
is fixed, both problems can be written as linear programs and an optimal solution can
be found in polynomial time using a standard linear program solver, such as the simplex
method or interior point methods. The solution of the linear program assigns k optimal
nodes to each group, while the $n - mk$ residual nodes need to be assigned separately.
In other words, we are left with an unconstrained assignment problem, where we can
assign the residual nodes greedily, i.e., each residual node $i$ to the
$\ell_1$-closest median $j$. We refer to the algorithm solving the assignment step as
OptimalAssignment.


Given the optimal assignment, the update step consists in calculating the new medians
of the clusters. We iterate these steps until convergence, i.e., until the assignment
matrix does not change or a user-defined maximum number of iterations is met. Since the
algorithm will find a local minimum of the cost function that depends on the initial
random partition of the nodes, we repeat the whole procedure a number of times and we
select the minimum cost solution. In the remainder of the paper we refer to this as the
DegreeAnonymization algorithm. Finally, note that while the original k-anonymity
problem was non-convex and NP-hard, here we solve two convex sub-problems for which a
global optimum can be found.


Anonymizing Very Large Graphs. In order to handle large time-varying graphs, for
example describing the social interactions between the users of an OSN, we need a fast
and efficient way to solve the GAP problem in the assignment step. Krumke and Thielen
(Krumke and Thielen 2013) proposed to map GAP to a minimum cost flow problem and to use
the Enhanced Capacity Scaling algorithm (ECS) to solve it. More specifically, the
problem of assigning $n$ nodes to $m$ groups can be mapped on a flow network with
$|V| = m + n$ nodes and $|E| = mn + m + n$ edges. However, the complexity of the ECS is
$O(|E| \log(|V|) (|E| + |V| \log(|V|)))$, which makes it infeasible when applied to
very large graphs.


We propose to replace the OptimalAssignment algorithm with a less computationally
demanding heuristic. The pseudocode of GreedyAssignment is shown in Algorithm 1. The
algorithm starts by iterating over the set medians, i.e., the rows of $P$, in random
order. For each median $r$, it computes the $\ell_1$ distance from $r$ to the temporal
degree vectors in $D$. Then, it assigns to $r$ the $k$ closest nodes $c$ that have not
been previously assigned to another median. When the anonymity set is complete, i.e.,
$k$ nodes have been assigned to $r$, the next median is processed. The assignment
procedure is repeated $l$ times, each time starting with a different random permutation
of the $p_{i\cdot}$s, and the minimum cost assignment is returned. Note that the
complexity of our heuristic is $O(lmn \log(n))$, which makes it possible to apply it to
very large networks, as opposed to the approach of Krumke and Thielen (Krumke and
Thielen 2013).
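To get a rough feeling for the gap between the two bounds, the sketch below evaluates their leading terms for a graph of the size of the largest dataset considered in the experiments. This is only back-of-the-envelope arithmetic: constants hidden by the $O(\cdot)$ notation are ignored, the logarithm base is chosen arbitrarily, and the values of `k` and `l` as well as the choice $m = \lfloor n/k \rfloor$ are assumptions made for the example.

```python
import math


def ecs_operations(n, m):
    """Leading term |E| log|V| (|E| + |V| log|V|) of the ECS bound."""
    V = m + n
    E = m * n + m + n
    return E * math.log2(V) * (E + V * math.log2(V))


def greedy_operations(n, m, l):
    """Leading term l * m * n * log(n) of the greedy heuristic bound."""
    return l * m * n * math.log2(n)


n, k, l = 100_000, 10, 10   # n nodes; k and l are illustrative values
m = n // k                  # assumed number of anonymity groups
ratio = ecs_operations(n, m) / greedy_operations(n, m, l)
```

Under these assumptions the ratio is many orders of magnitude, which is consistent with the claim that the flow-based formulation does not scale to such graphs.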





































Algorithm 1: GreedyAssignment

Input : A degree matrix D, a set median matrix P and a
        desired anonymity k
Output : An optimal assignment matrix A

for i ← 1 to l do
    P ← scramble the rows of P;
    A ← m × n all-zero matrix;
    foreach r ∈ P.rows do
        d ← compute distance from r to D;
        nn ← sort nodes for increasing d;
        foreach c ∈ nn do
            if k nodes have been assigned to r then
                break;
            if c has not been assigned yet then
                A(r, c) ← 1;
    iterA[i] ← A;
A ← select iterA[i] with minimum cost;
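A possible Python rendering of the greedy assignment is sketched below. It follows the steps described in the text (random order of the medians, $k$ closest unassigned nodes per median, best of $l$ repetitions), but several details are assumptions made for the example: the assignment is returned as a list rather than a binary matrix, residual nodes are attached to their $\ell_1$-closest median as described for the LP-based step, and the function and parameter names are hypothetical.

```python
import random


def l1(u, v):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def greedy_assignment(D, P, k, l=10, seed=0):
    """Return (group_of, cost), where group_of[c] is the row of P assigned to node c."""
    rng = random.Random(seed)
    n, m = len(D), len(P)
    best, best_cost = None, float("inf")
    for _ in range(l):
        order = list(range(m))
        rng.shuffle(order)                       # visit the medians in random order
        group_of = [None] * n
        for r in order:
            # Nodes sorted by increasing l1 distance from median r.
            nearest = sorted(range(n), key=lambda c: l1(D[c], P[r]))
            taken = 0
            for c in nearest:
                if taken == k:                   # anonymity set of r is complete
                    break
                if group_of[c] is None:
                    group_of[c] = r
                    taken += 1
        for c in range(n):                       # residual nodes (assumption):
            if group_of[c] is None:              # attach to the closest median
                group_of[c] = min(range(m), key=lambda r: l1(D[c], P[r]))
        cost = sum(l1(D[c], P[g]) for c, g in enumerate(group_of))
        if cost < best_cost:
            best, best_cost = group_of, cost
    return best, best_cost


# Toy input: 5 nodes, 2 slices, 2 medians, k = 2 (not from the paper).
D = [[1, 1], [1, 2], [5, 5], [6, 5], [2, 3]]
P = [[1, 1], [5, 5]]
assignment, total_cost = greedy_assignment(D, P, k=2)
```

In a full implementation this function would take the place of the assignment step inside the alternating procedure, with the medians recomputed after every call.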


Enforcing Realizability 

While DegreeAnonymization will return a matrix $\hat{D}$ whose columns are k-anonymous
degree sequences, these are not guaranteed to be realizable. In Liu and Terzi (Liu and
Terzi 2008), when a k-anonymous degree sequence is not realizable the authors propose
to modify the original graph by adding uniform structural noise in the form of
additional edges. The anonymization and noise addition are alternated until a
realizable k-anonymous degree sequence is returned, while the convergence is guaranteed
by noting that, in the worst case, the repeated addition of edges will result in a
complete graph, which is by definition k-anonymous.

However, while in Liu and Terzi (Liu and Terzi 2008) the problem is that of anonymizing
a single unattributed graph, in this paper we intend to anonymize a time-varying graph.
Not only is there a higher probability of generating a non-realizable sequence when $T$
different degree sequences have to be anonymized, but it is also not possible to
locally alter the structure of the original temporal slices and the k-anonymity group
memberships while ensuring that the k-anonymity across the whole time-varying graph is
preserved. For this reason, we decide to locally operate on the non-realizable degree
sequences in the following way.

Recall that a degree sequence $d_{\cdot t}$ is realizable if $\sum_{i=1}^{n} d_{it}$ is
even and if it satisfies Eq. 1. Let us first focus on Eq. 1. Given a temporal slice
$G_t$ and a non-realizable k-anonymous degree sequence $\hat{d}_{\cdot t}$, we want to
project $\hat{d}_{\cdot t}$ to the nearest k-anonymous degree sequence $d^*_{\cdot t}$
that satisfies this equation. The function that we want to minimize is the $\ell_1$
norm between $\hat{d}_{\cdot t}$ and $d^*_{\cdot t}$. That is, our problem can be
written as

$$\begin{aligned}
\text{minimize} \quad & \| d^*_{\cdot t} - \hat{d}_{\cdot t} \|_1 \\
\text{subject to} \quad & A d^*_{\cdot t} \leq b(d^*_{\cdot t}) \\
& d^*_{\cdot t} \geq 0
\end{aligned} \tag{3}$$

where $A$ and $b(d^*_{\cdot t})$ denote respectively the matrix of constraints and the
vector of constant terms defined by Eq. 1, i.e., the $j$-th element of
$b(d^*_{\cdot t})$ is $j(j-1) + \sum_{i=j+1}^{n} \min(d^*_{it}, j)$. Note, however,
that in this formulation of the problem we are allowing all the $n$ components of
$d^*_{\cdot t}$ to vary, and thus the k-anonymity of $\hat{d}_{\cdot t}$ is not
guaranteed to be preserved. Instead, we propose to minimize
$\| S\delta_{\cdot t} - S\delta^*_{\cdot t} \|_1$, where $\delta_{\cdot t}$ is the
$m$-element vector such that $\delta_{it}$ is the degree of the nodes in the $i$-th
k-anonymity group, $m$ denotes the number of groups and $S$ is the $n \times m$
assignment matrix such that the element $S_{ij} = 1$ if the $i$-th node belongs to the
$j$-th group. This can be transformed into a linear program as follows. We first
introduce the slack variables $x^+ - x^- = \delta^*_{\cdot t} - \delta_{\cdot t}$,
where $x^+, x^- \geq 0$, and we rewrite the objective function as
$\mathbf{1}^\top S x^+ + \mathbf{1}^\top S x^-$, where $\mathbf{1}$ denotes the
all-ones vector. Moreover, recall that the $j$-th element of $b(\delta^*_{\cdot t})$ is
$j(j-1) + \sum_{i=j+1}^{n} \min(\delta^*_{it}, j)$, and thus the constraints are not
linear. In order to linearize the constraints, we solve an iterated linear program
where we fix an initial value $b(\delta_{\cdot t})$, and we alternate the computation
of the optimal $\delta^*_{\cdot t}$ and $b(\delta^*_{\cdot t})$, until convergence.
That is, during the $i$-th iteration we solve the linear program

$$\begin{aligned}
\text{minimize} \quad & \mathbf{1}^\top S x^+ + \mathbf{1}^\top S x^- \\
\text{subject to} \quad & A x^+ - A x^- \leq b(\delta^*_{\cdot t}) - A \delta_{\cdot t} \\
& x^+, x^- \geq 0
\end{aligned} \tag{4}$$

where $\delta^*_{\cdot t}$ is the optimal solution at the $(i-1)$-th iteration.

While finding a solution for Eq. 4 requires solving an Integer Linear Program, we
propose to solve an alternative problem where the matrix of constraints is totally
unimodular and the feasible solutions are guaranteed to be integer-valued. Let us write
$A = LS$, where $L$ is the lower triangular matrix, i.e., such that $L_{ij} = 1$ if
$i \geq j$, $L_{ij} = 0$ otherwise. Since $L$ is invertible, we can rewrite the problem
in Eq. 4 as

$$\begin{aligned}
\text{minimize} \quad & \mathbf{1}^\top S x^+ + \mathbf{1}^\top S x^- \\
\text{subject to} \quad & S x^+ - S x^- \leq L^{-1} b(\delta^*_{\cdot t}) - S \delta_{\cdot t} \\
& x^+, x^- \geq 0
\end{aligned} \tag{5}$$

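Theorem 1. A solution of the linear program in Eq. 5 satisfies Eq. 4.

Proof. We need to prove that a feasible solution for the problem in Eq. 5 is also
feasible for the problem in Eq. 4. To this end, let us rewrite Eq. 4 as

$$\begin{aligned}
\text{minimize} \quad & \mathbf{1}^\top S x^+ + \mathbf{1}^\top S x^- \\
\text{subject to} \quad & S x^+ - S x^- + L^{-1} z = L^{-1} b(\delta^*_{\cdot t}) - S \delta_{\cdot t} \\
& x^+, x^-, z \geq 0
\end{aligned} \tag{6}$$

where we introduced the slack variables $z_i \geq 0$. By iteratively updating the value
of the $x^-$s we can ensure that the inequality

$$S x^+ - S x^- \leq L^{-1} b(\delta^*_{\cdot t}) - S \delta_{\cdot t} \tag{7}$$

holds. As a consequence, we have that

$$L^{-1} z = L^{-1} b(\delta^*_{\cdot t}) - S \delta_{\cdot t} - S x^+ + S x^- \geq 0 \tag{8}$$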















In order to show that a solution that satisfies Eq. 7 will also be feasible for the
problem in Eq. 6, we need to prove that, whenever the former holds, we have $z \geq 0$.
Let us introduce a slack variable $y \geq 0$ and rewrite Eq. 7 as

$$S x^+ - S x^- + y = L^{-1} b(\delta^*_{\cdot t}) - S \delta_{\cdot t} \tag{9}$$

From Eqs. 6 and 9 it follows that

$$y = L^{-1} z \geq 0 \tag{10}$$

Finally, since $y \geq 0$ and $z = Ly$, $z$ is a sum of non-negative values and thus
$z \geq 0$. □
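One way to read the role of $L$ in the argument above: multiplying by $L$ takes cumulative sums, while multiplying by $L^{-1}$ takes consecutive differences. The constraints of Eq. 4 therefore bound the partial sums of the degree sequence by $b$, whereas those of Eq. 5 bound each individual degree by an increment of $b$; summing the first $j$ constraints of Eq. 5 telescopes into the $j$-th constraint of Eq. 4. The sketch below only verifies the claim about $L^{-1}$ numerically on a small instance; the helper names and the vector `b` are illustrative.

```python
def lower_triangular_ones(n):
    """The matrix L with L[i][j] = 1 if i >= j and 0 otherwise."""
    return [[1 if i >= j else 0 for j in range(n)] for i in range(n)]


def first_difference(n):
    """Candidate inverse of L: 1 on the diagonal, -1 just below it."""
    return [[1 if i == j else -1 if i == j + 1 else 0 for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


n = 4
L, L_inv = lower_triangular_ones(n), first_difference(n)
identity = [[int(i == j) for j in range(n)] for i in range(n)]
assert matmul(L, L_inv) == identity      # L_inv is indeed the inverse of L

b = [2, 6, 9, 12]                        # illustrative right-hand side
increments = matvec(L_inv, b)            # consecutive differences of b
partial_sums = matvec(L, increments)     # cumulative sums recover b
```

Since $L$ has only non-negative entries, any vector that is bounded entry-wise by `increments` has partial sums bounded by `b`, which is the content of the last step of the proof.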

We propose a fast and effective pivot selection algorithm to find a feasible solution
for the problem in Eq. 5. Our goal is that of setting the values of $x_i^+$ and $x_i^-$
so as to satisfy the $i$-th inequality constraint, for all $1 \leq i \leq m$. To start,
we initialize $x^+$ and $x^-$ as the all-zero vectors. Given the $i$-th constraint,
note that we cannot have $x_i^+ > 0$ and $x_i^- > 0$ at the same time. Thus, when the
$i$-th constraint is violated, i.e.,
$(L^{-1} b(\delta^*_{\cdot t}) - S(\delta_{\cdot t} + x^+ - x^-))_i < 0$, we set
$x_i^-$ so that the inequality is reversed, but we let $x_i^+ = 0$. In other words, we
selectively reduce the degree of those groups that violate the constraints. More
precisely, the degree of a group is reduced by an amount proportional to the total
magnitude of the violated constraints for that group. We then propagate the reduction
to the remaining groups so as to maintain the order of the degree sequence. We omit the
pseudocode of the EnforceRealizability algorithm due to space constraints.

Let $d^*_{\cdot t}$ be the complete degree sequence output by EnforceRealizability. As
a last step to ensure that $d^*_{\cdot t}$ is a realizable degree sequence, we need to
make sure that $\sum_{i=1}^{n} d^*_{it}$ is even. To this end, we pick the smallest
group with odd degree sum, and we either increase or decrease the degrees of each of
its members by 1. More specifically, we pick the operation (increase or decrease) that
yields a feasible solution with minimal $\ell_1$ distance from the original degree
sequence.
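The parity correction described above can be sketched as follows. Feasibility is checked here with the Erdős–Gallai test of Definition 1; the data layout (one common degree per group plus a list of member nodes) and all function names are assumptions made for the example, not the paper's implementation.

```python
def is_realizable(degrees):
    """Erdős–Gallai test (Definition 1)."""
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    return all(sum(d[:j]) <= j * (j - 1) + sum(min(x, j) for x in d[j:])
               for j in range(1, len(d) + 1))


def expand(delta, groups, n):
    """Per-node degree sequence from per-group degrees."""
    d = [0] * n
    for g, members in enumerate(groups):
        for i in members:
            d[i] = delta[g]
    return d


def fix_parity(delta, groups, original):
    """Make the degree sum even by shifting one group's degree by +1 or -1."""
    n = len(original)
    if sum(expand(delta, groups, n)) % 2 == 0:
        return list(delta)
    # Groups whose degree sum is odd (odd size and odd degree).
    odd = [g for g, members in enumerate(groups) if (len(members) * delta[g]) % 2 == 1]
    g = min(odd, key=lambda h: len(groups[h]))        # smallest such group
    candidates = []
    for step in (+1, -1):
        trial = list(delta)
        trial[g] += step
        d = expand(trial, groups, n)
        if is_realizable(d):
            cost = sum(abs(a - b) for a, b in zip(d, original))
            candidates.append((cost, trial))
    if not candidates:
        raise ValueError("neither +1 nor -1 yields a realizable sequence")
    return min(candidates)[1]                         # closest feasible option


# Groups {0, 1, 2} with degree 1 and {3, 4} with degree 2: the sum is odd.
groups = [[0, 1, 2], [3, 4]]
fixed = fix_parity([1, 2], groups, original=[1, 1, 1, 2, 2])
```

In this toy case decreasing the degree of the first group would leave two nodes of degree 2 with no possible neighbours, so only the increase is feasible.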


Graph Construction 

With the anonymous and realizable degree matrix at hand, we can proceed to construct
the anonymized time-varying graph. More specifically, we build each temporal slice
independently using the relaxed graph construction method of Liu and Terzi (Liu and
Terzi 2008). Given a realizable degree sequence $d^*_{\cdot t}$, we use the PRIORITY
algorithm to build a graph $\hat{G}_t$, such that the edge overlap with the original
graph is as large as possible. The PRIORITY algorithm creates a degree anonymous graph
with a high edge intersection with the original graph by prioritizing the construction
of edges between vertices that share an edge in the original graph. Note that in (Liu
and Terzi 2008) when the input sequence is not realizable the PRIORITY algorithm needs
to call the PROBING procedure. This in turn perturbs the structure of the original
graph, attempts to create a new anonymous and realizable degree sequence, and calls
PRIORITY again. Here, instead, we are guaranteed that the input sequence is realizable,
and thus there is no need to run DegreeAnonymization again.
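To illustrate how a realizable degree sequence can be turned into a simple graph, the sketch below uses the classical Havel–Hakimi construction. This is not the PRIORITY algorithm: it ignores the original graph altogether and makes no attempt to maximize the edge overlap. It is shown only as the simplest instance of degree-sequence-driven graph construction, and the function name is an assumption.

```python
def havel_hakimi(degrees):
    """Return an edge list realizing `degrees`, or None if no simple graph exists."""
    residual = list(degrees)
    edges = []
    if not residual:
        return edges
    while True:
        # Pick a node with the largest residual degree.
        v = max(range(len(residual)), key=lambda i: residual[i])
        need = residual[v]
        if need == 0:
            return edges                      # all degrees have been satisfied
        residual[v] = 0
        # Connect v to the `need` other nodes with the largest residual degree.
        others = sorted((i for i in range(len(residual)) if i != v and residual[i] > 0),
                        key=lambda i: -residual[i])
        if len(others) < need:
            return None                       # the sequence is not realizable
        for u in others[:need]:
            residual[u] -= 1
            edges.append((min(u, v), max(u, v)))


edges = havel_hakimi([2, 2, 2, 2, 2])
realized = [sum(1 for e in edges if i in e) for i in range(5)]
impossible = havel_hakimi([3, 3, 1, 1])
```

A priority-based variant would, at each step, prefer neighbours that were adjacent to the selected node in the original slice.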


Experiments 

We evaluated our framework on a number of real-world and synthetic datasets. As the
final graph is constructed using the graph construction method of Liu and Terzi (Liu
and Terzi 2008), we focus most of our evaluation on the heuristics proposed to compute
the anonymous matrix $\hat{D}$ from the original matrix $D$. More specifically, the
quality of our solution is measured in terms of the normalized cost $C(D, \hat{D})$,
obtained by normalizing the distance between $D$ and $\hat{D}$ by $Tn(n-1)$. The
results are presented in terms of average normalized cost (± standard error) over 20
repetitions. Also, recall that both DegreeAnonymization and GreedyAssignment depend on
a number of parameters. Unless otherwise stated, we set the number of iterations $l$ in
Algorithm 1 to 10 and we allow a maximum of 50 iterations before convergence in
DegreeAnonymization.


Data 

The MIT Social Evolution dataset (Madan et al. 2012)
consists of 5 layers representing different types of social
connections between 84 students, for a total of 7,055 edges.
The layers represent, respectively, the connections between:
1) close friends, 2) students that participated in at least two
common activities per week, 3) students that discussed politics
at least once since the last survey, 4) students that shared
Facebook photos, and 5) students that shared
blog/LiveJournal/Twitter activities.

The Enron dataset (Shetty and Adibi 2005) consists of
the time-varying network representing e-mail exchanges
between 151 users during the period from May 1999 to
June 2002 (1,146 days). We consider three alternative
versions of this graph, where the slices represent the activity
over a month, week and day, respectively. We will refer to
these three graphs as Enron Month (7,277 edges over 38
months), Enron Week (13,080 edges over 164 weeks) and
Enron Day (21,257 edges over 1,146 days).

The Irvine dataset (Opsahl and Panzarasa 2009) represents
the social connections between 1,899 users of an online
student community at the University of California, Irvine.
The data consists of 20,296 interactions over a period of 51
days.

Finally, the Yahoo dataset is a collection of 28 days of
Yahoo Instant Messenger events, where each node is an IM
user, and each link represents a communication event on a
given day. This is the largest dataset considered in our study,
with a total of 100,000 nodes interacting over 28 temporal
slices. We also consider a reduced version of this graph,
where we select 10,000 nodes through a breadth-first
exploration of the largest connected component of the
aggregated graph over the 28 days. We refer to the two versions
of the Yahoo graph as Yahoo 10^4 (139,524 edges) and
Yahoo 10^5 (2,026,734 edges).

Figure 3: Average anonymization cost (top) and average number
of iterations (bottom) needed to converge by
DegreeAnonymization using GreedyAssignment and the LP solver,
for varying values of k.

In addition to these real-world datasets, we add a set of
synthetic time-varying graphs where the temporal duration
of an edge is sampled from a geometric distribution with
parameter θ, i.e., θ represents the probability that an edge
changes from being absent (present) to being present
(absent). By varying θ we can control the average temporal
correlation (Clauset and Eagle 2007) of the time-varying graph,
where the temporal correlation of a graph measures the overlap
between successive temporal realizations of the nodes'
neighborhoods. Thus, sampling from a geometric distribution
with a high θ will generate unstable time-varying graphs
where edges constantly appear and disappear, i.e., graphs
with a low temporal correlation. On the other hand, when
θ → 0 the probability of a structural change is close to zero,
i.e., the time-varying graph shows a high temporal
correlation. Given a value of θ, we then generate a time-varying
graph by sampling 10 temporal slices over 100 nodes. In
total, we generate 11 time-varying graphs with increasing
average temporal correlation.
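
A minimal Python sketch of such a generator is given below.
Drawing edge durations from a geometric distribution is
equivalent to letting each node pair change state independently
with a fixed probability at every step, which is what the code
does. The density p0 of the first slice and the function name are
assumptions of this sketch, as the initial graph is not specified
above.

```python
import random
from itertools import combinations


def sample_time_varying_graph(n, T, theta, p0=0.1, seed=None):
    """Return a list of T edge sets over the nodes 0..n-1.

    Between consecutive slices every node pair flips its state
    (absent <-> present) independently with probability theta.
    p0 is the (assumed) edge density of the first slice.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    current = {e for e in pairs if rng.random() < p0}
    slices = [set(current)]
    for _ in range(T - 1):
        flipped = {e for e in pairs if rng.random() < theta}
        current = current ^ flipped  # symmetric difference toggles pairs
        slices.append(set(current))
    return slices


stable = sample_time_varying_graph(100, 10, theta=0.0, seed=1)
unstable = sample_time_varying_graph(100, 10, theta=1.0, seed=1)
```

With theta = 0 all slices are identical, whereas with theta = 1
every slice is the complement of the previous one, so that slices
two steps apart coincide.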


Degree Anonymization 

As a first experiment, we evaluate the efficiency of
DegreeAnonymization and in particular of our
GreedyAssignment heuristic. We compare it with an
optimal assignment of the nodes to the anonymity groups
obtained by solving the assignment step with a standard
LP solver. In these experiments we use the revised simplex
algorithm implemented in the GNU Linear Programming
Kit (GLPK), version 4.3, and we compare the solution
found by the LP solver to that of GreedyAssignment.
GLPK is free and open-source software designed
to solve large-scale linear programs. However, note that in
these experiments we make use only of the MIT, Enron
and Irvine datasets, as the LP solver was not able to scale
to the size of the Yahoo datasets. Note in fact that in the
largest Yahoo dataset, when k = 2, the number of anonymity
groups is m = 50,000, for a total of 5 billion possible
node-to-group assignments, i.e., the matrix of coefficients in
the GAP has 5 billion entries.
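
The size of this coefficient matrix can be checked with a short
calculation. The sketch below assumes that the number of groups
is the integer part of n/k, which matches the value m = 50,000
quoted above, and that a dense matrix with 8 bytes per entry is
stored; both the helper name and the storage format are
illustrative assumptions.

```python
def assignment_matrix_size(n, k, bytes_per_entry=8):
    """Groups, entries and bytes of a dense n x m assignment matrix."""
    m = n // k                 # number of anonymity groups (assumed)
    entries = n * m            # one coefficient per node-group pair
    return m, entries, entries * bytes_per_entry


m, entries, size_bytes = assignment_matrix_size(100_000, 2)
size_gb = size_bytes / 1e9     # decimal gigabytes
```

Under these assumptions the matrix alone would take about 40 GB,
which is consistent with the LP solver failing to scale to this
dataset.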

GLPK: http://www.gnu.org/software/glpk/glpk.html


Fig. 3 shows the average normalized cost and the
average number of iterations to convergence for
DegreeAnonymization, when either GreedyAssignment
or the LP solver is used to determine the optimal
assignment. Our first observation is that, in general, in both
cases the anonymization cost increases as k increases. This
is not unexpected, as creating larger anonymity groups
requires introducing an increasing amount of structural noise.
However, we also note a slight drop in the cost for some
values of k. This may be linked to the cost of assigning residual
nodes through the AssignResidual procedure. Recall, in
fact, that the number of residual nodes depends on the
anonymity level k, and it can increase for some k2 > k1. As an
example, let us consider a graph with 12 nodes, where with
k = 4 there are no residual nodes to assign, whereas with
k = 3 there is 1 node left to assign.

Fig. 3 (top) also shows that our GreedyAssignment
heuristic generally achieves a good approximation of the
anonymization cost with respect to the LP solver. In
particular, for k = 2 our heuristic consistently outperforms
the optimal assignment found using the LP solver. This in
turn may be related to the problem of local minima for
the simplex method. On the other hand, our greedy
exploration of the function landscape seems to lead
DegreeAnonymization to find a better local minimum.
However, as the landscape gets more complicated, i.e., as k
increases, the performance of our heuristic quickly
deteriorates and DegreeAnonymization with the
GreedyAssignment heuristic performs consistently worse than with
the LP solver, although the two costs remain generally close.

Fig. 3 (bottom) shows the average number of iterations
needed to converge for DegreeAnonymization, when
either GreedyAssignment or the LP solver is used to
determine the optimal assignment. Interestingly, over all the
3 datasets, for k = 2 we see that DegreeAnonymization
with the LP solver quickly converges to a non-optimal local
minimum, while GreedyAssignment leads to a slower
convergence and a better local minimum. As k increases,
DegreeAnonymization tends to reach convergence in
fewer iterations when GreedyAssignment is used.
Although Fig. 3 (top) shows that this leads to a slightly higher
anonymization cost, Fig. 3 (bottom) shows that this is
compensated for by a faster convergence. Moreover, it should be
noted that a single run of GreedyAssignment is considerably
faster than a single run of the LP solver.

(a) Realizability   (b) Temporal Correlation   (c) Temporal Resolution

Figure 4: (a) Empirical cumulative distribution function of the
average anonymization costs for 1,000 non-realizable degree
sequences. (b) Average anonymization cost on the synthetic
datasets for increasing levels of average temporal correlation.
(c) Average anonymization cost on the Enron Month dataset for
varying values of k and varying temporal resolution.


Degree Sequences Realizability 


We now evaluate the EnforceRealizability algorithm.
More specifically, we are interested in comparing the
solutions obtained by solving the original integer linear
program with those of the linear program that approximates it.
Recall in fact that the feasible solutions of the linear
program, although feasible also for the integer linear program,
are only a subset of those. We generate 1,000 random degree
sequences over 10 nodes, where each sequence is created such
that it is k-anonymous but not realizable, for a random level
of anonymity k. The reason why we resort to synthetic data is
that in our experiments we observe that, when we apply
DegreeAnonymization on real-world data, the anonymized degree
sequence of each slice is almost always realizable, a behaviour
that was also observed in (Liu and Terzi 2008). (The sum of the
degree sequence may still be odd, but this can be fixed easily
without invoking EnforceRealizability.) This may be due to the
fact that the original degree sequences are indeed realizable,
and thus it may be more likely that the anonymized degree
sequences are also realizable.

Fig. 4(a) shows the empirical cumulative distribution
function of the normalized costs of the degree sequences
obtained by solving the integer linear program and the
linear program, where the solution of the latter is
computed using the EnforceRealizability algorithm. It is
interesting to note that the solutions found by
EnforceRealizability are close to those found by solving the
original NP-hard problem.
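
For reference, the two properties required of each test sequence
can be checked as in the following Python sketch. Realizability
is tested with the standard Erdős–Gallai conditions for simple
undirected graphs, which is assumed here to be the notion of
realizability intended above; the function names are
hypothetical.

```python
from collections import Counter


def is_realizable(degrees):
    """Erdos-Gallai test: can a simple graph have these degrees?"""
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    for r in range(1, len(d) + 1):
        lhs = sum(d[:r])
        rhs = r * (r - 1) + sum(min(x, r) for x in d[r:])
        if lhs > rhs:
            return False
    return True


def anonymity_level(degrees):
    """Largest k such that every degree value occurs at least k times."""
    return min(Counter(degrees).values())


seq = [3, 3, 1, 1]                # 2-anonymous ...
level = anonymity_level(seq)
realizable = is_realizable(seq)   # ... but violates Erdos-Gallai at r = 2
```

The sequence (3, 3, 1, 1) is an example of the kind of input
considered in this experiment: every degree value occurs twice,
yet no simple graph has this degree sequence.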


Temporal Correlation 


The average temporal correlation is defined between 0 and
1 (Clauset and Eagle 2007), where a value of 0 is achieved
for completely anti-correlated temporal slices, a value of
1 is achieved for completely correlated temporal slices,
while non-correlation corresponds to a 0.5 value. Fig. 4(b)
shows that, independently of k, when the temporal slices are
strongly correlated the anonymization cost drops
dramatically. In fact, the more homogeneous the structure of the
slices is, the easier it is to define anonymity groups that will
remain consistent on all the slices without introducing much
noise. Note that also a strongly anti-correlated time-varying
graph can be considered homogeneous, as in the limit
case, i.e., when the probability of a structural change is 1,
the structure of the graph at time t is always identical to
that at time t ± 2.


Temporal Resolution 


Similarly to the temporal correlation, the temporal
resolution of the graph slices may also influence the complexity
of the anonymization task. In fact, Fig. 4(c) shows that, as
we increase the temporal resolution, the average
anonymization cost becomes less dependent on k. More specifically,
with a temporal resolution of 1 month, the cost of enforcing
k = 10 anonymity is about 50% higher than for k = 2,
whereas with a temporal resolution of 1 week there is a 35%
increase and with a 1 day resolution a 25% increase. In
general, we can think that the higher the temporal resolution,
the sparser the slices will be. A sparse graph is naturally
anonymous, as a large number of nodes are completely
disconnected from the rest, i.e., they remain idle, and thus they
are indistinguishable from each other. Thus, when enforcing
anonymity across the temporal dimension, we have a higher
degree of freedom when grouping the nodes in sparse slices
while minimizing edit operations. In the limit case, if we
consider a small enough time interval, some temporal slices
become empty, i.e., we observe no interactions between the
nodes during some periods. Here each node is
indistinguishable from the remaining n - 1 nodes of the graph, and no
structural alteration is needed in these slices when aligning
the anonymity groups across the longitudinal dimension.
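
The effect of the temporal resolution on sparsity can be
illustrated with a short Python sketch that groups timestamped
interactions into slices of a chosen width. It assumes events of
the form (day, u, v) with integer day indices starting at 0 and
fixed-width windows (e.g., 30 days as a stand-in for a month),
which may differ from the calendar-based slicing used for the
datasets above; the function name is hypothetical.

```python
from collections import defaultdict


def slice_events(events, days_per_slice):
    """Group (day, u, v) events into consecutive undirected slices."""
    buckets = defaultdict(set)
    for day, u, v in events:
        buckets[day // days_per_slice].add(frozenset((u, v)))
    last = max(buckets) if buckets else -1
    # Slices without any interaction are kept as empty edge sets.
    return [buckets.get(t, set()) for t in range(last + 1)]


events = [(0, 1, 2), (0, 2, 1), (8, 1, 3), (29, 2, 3)]
monthly = slice_events(events, 30)
weekly = slice_events(events, 7)
daily = slice_events(events, 1)
empty_daily = sum(1 for s in daily if not s)
```

In this toy example the single monthly slice contains three
edges, while at daily resolution 27 of the 30 slices are empty
and therefore require no structural alteration.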


























Figure 5: Average cosine similarity between the PageRank vectors of the original temporal slices and the anonymized ones.
The shaded area shows how the number of active edges varies with time.


Dataset        k = 2            k = 5            k = 10

MIT            0.227 ± 0.009    0.138 ± 0.007    0.131 ± 0.006
Enron (M)      0.578 ± 0.015    0.358 ± 0.008    0.267 ± 0.006
Enron (W)      0.927 ± 0.027    0.591 ± 0.021    0.400 ± 0.012
Enron (D)      2.807 ± 0.113    1.851 ± 0.047    1.888 ± 0.033
Irvine         21.28 ± 0.265    8.151 ± 0.285    4.155 ± 0.107
Yahoo 10^4     3,524 ± 16.11    1,518 ± 16.29    632.2 ± 14.28
Yahoo 10^5     ≈ 80 hours       40 hours         20 hours

Table 1: Runtime evaluation (seconds).

Impact on Graph Structure 

So far we have been evaluating our anonymization
framework in terms of the normalized anonymization cost.
However, this gives us only a partial insight into the information
loss that we incur when we anonymize a time-varying graph.
In order to better evaluate the effects of the structural
perturbation caused by the anonymization process, we evaluate the
PageRank (Page et al. 1999) of the anonymized
time-varying graph. The PageRank vector is a measure of node
importance commonly used in network analysis. We
compute the PageRank vector of each temporal slice for both
the original and the anonymized graphs. Fig. 5 shows the
cosine similarity (Jain and Dubes 1988) between the
PageRank vectors of the original and anonymized temporal slices
as a function of time, over three different datasets. Note the
shaded area showing the varying volume of interactions over
time. When k is low, the PageRank centrality of the nodes
is well approximated in the anonymized graph, suggesting
that most of the structural information is retained. However,
as the level of anonymity increases, more noise needs to
be added and the centrality of the anonymized nodes starts
to deviate from its original value. Interestingly, we observe
that the cosine similarity remains high on the temporal slices
where most of the interactions are concentrated. This should
not come as a surprise, as sparser slices are more sensitive
to the addition and removal of edges.
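
The comparison for a single slice can be sketched as follows in
Python. PageRank is computed by power iteration on an undirected
slice; the damping factor of 0.85, the fixed number of
iterations and the uniform redistribution of the score of
isolated nodes are common defaults assumed for this
illustration, not settings reported above.

```python
from math import sqrt


def pagerank(n, edges, damping=0.85, iterations=100):
    """PageRank of an undirected graph on nodes 0..n-1."""
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by isolated nodes is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if not neighbours[u])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for u in range(n):
            for v in neighbours[u]:
                new[v] += damping * rank[u] / len(neighbours[u])
        rank = new
    return rank


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


original = pagerank(4, [(0, 1), (0, 2), (0, 3)])    # star graph
perturbed = pagerank(4, [(0, 1), (0, 2), (1, 2)])   # one edge moved
similarity = cosine_similarity(original, perturbed)
```

In a sparse slice such as this one, moving a single edge already
changes the PageRank vector noticeably, in line with the
observation that sparser slices are more sensitive to edits.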

Runtime Evaluation

We conclude this section with the runtime evaluation of our
framework, as reported in Table 1. Note that the code of
DegreeAnonymization can be easily parallelized
using standard multiprocessing programming APIs such as
OpenMP. Thus, the runtimes are measured by executing
the I outer iterations of DegreeAnonymization in
parallel on a server equipped with two 6-core Intel Xeon E5645
(2.40GHz) CPUs with HyperThreading enabled, for a total of 24
logical cores, and 48GB of RAM. As we can see, the
anonymization of the Yahoo graph is the most expensive one in
terms of time, as it took approximately 80 hours to create
a k = 2 anonymous graph. Recall, however, that although
the number of nodes of the Yahoo graph is 100,000, we are
effectively operating on a total of 100,000 × 28 (days) =
2,800,000 nodes. Another important observation is
that the runtime does not grow linearly with the longitudinal
dimension. While Enron Day has ≈ 30 times the temporal
slices of Enron Month, the runtime is only ≈ 5 times higher.
Finally, we should stress that our anonymization technique
is aimed at preventing attacks on a time-varying graph that
has been published in its entirety, and thus the computational
time can be considered a less stringent constraint than the
level of privacy that we can ensure.


Discussion and Conclusions 


In this paper we have presented a novel framework for the
anonymization of time-varying and multi-layer graphs. We
have considered the case of an attacker that has access to the
number of social ties of an OSN user over time or over
multiple online platforms. In order to protect the nodes'
identity, we have proposed to perturb the structure of the
time-varying graph so that the temporal degree sequence of each
node becomes indistinguishable from that of at least
k - 1 other nodes. To this end, we have introduced a variant of
k-means in the ℓ1 space with the additional constraint that each
group needs to contain at least k nodes. We have also shown
how to approximate the problem of making a degree
sequence realizable as an iterated linear program, and we have
proposed a fast and effective algorithm to solve it. We have
applied the proposed framework on a number of real-world
and synthetic networks, and we have shown that the amount
of edge insertions or deletions that we need to perform
depends on the average temporal correlation (Clauset and
Eagle 2007) of the graph. In order to evaluate the structural
information loss after anonymization is applied, we have
compared the PageRank vectors of the original and
anonymized temporal slices, and we have found that the amount of
structural information that is preserved is higher in temporal
slices that correspond to high activity periods.

Note that in our framework there is a clear trade-off
between the computational complexity of the proposed
algorithms and the quality of the local optimum we converge to.
In other words, a lower runtime inevitably comes at the cost
of an increased edit distance between the original graph and
the anonymized one. We plan to investigate more efficient
heuristics, both in terms of time complexity and quality of
the solution. In particular, our aim is to modify the proposed
framework to be able to scale up to even larger graphs.
Another interesting direction of research is the analysis of
scenarios in which the graph is only partially k-anonymous, i.e.,
only a subset of the nodes satisfies k-anonymity, or where
the level of anonymity k varies across the nodes (Chester et
al. 2012).